 mobile application


A Smart Healthcare System for Monkeypox Skin Lesion Detection and Tracking

Alghoraibi, Huda, Alqurashi, Nuha, Alotaibi, Sarah, Alkhudaydi, Renad, Aldajani, Bdoor, Alqurashi, Lubna, Batweel, Jood, Thafar, Maha A.

arXiv.org Artificial Intelligence

Monkeypox is a viral disease characterized by distinctive skin lesions and has been reported in many countries. The recent global outbreak has emphasized the urgent need for scalable, accessible, and accurate diagnostic solutions to support public health responses. In this study, we developed ITMAINN, an intelligent, AI-driven healthcare system specifically designed to detect Monkeypox from skin lesion images using advanced deep learning techniques. Our system consists of three main components. First, we trained and evaluated several pretrained models using transfer learning on publicly available skin lesion datasets to identify the most effective models. For binary classification (Monkeypox vs. non-Monkeypox), the Vision Transformer, MobileViT, Transformer-in-Transformer, and VGG16 achieved the highest performance, each with an accuracy and F1-score of 97.8%. For multiclass classification, which contains images of patients with Monkeypox and five other classes (chickenpox, measles, hand-foot-mouth disease, cowpox, and healthy), ResNetViT and ViT Hybrid models achieved 92% accuracy, with F1 scores of 92.24% and 92.19%, respectively. The best-performing and most lightweight model, MobileViT, was deployed within the mobile application. The second component is a cross-platform smartphone application that enables users to detect Monkeypox through image analysis, track symptoms, and receive recommendations for nearby healthcare centers based on their location. The third component is a real-time monitoring dashboard designed for health authorities to support them in tracking cases, analyzing symptom trends, guiding public health interventions, and taking proactive measures. This system is fundamental in developing responsive healthcare infrastructure within smart cities. Our solution, ITMAINN, is part of revolutionizing public health management.


Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent

Mo, Fanglin, Chen, Junzhe, Zhu, Haoxuan, Hu, Xuming

arXiv.org Artificial Intelligence

Mobile GUI agents execute user commands by directly interacting with the graphical user interface (GUI) of mobile devices, demonstrating significant potential to enhance user convenience. However, these agents face considerable challenges in task planning, as they must continuously analyze the GUI and generate operation instructions step by step. This process often leads to difficulties in making accurate task plans, as GUI agents lack a deep understanding of how to effectively use the target applications, which can cause them to become "lost" during task execution. To address the task planning issue, we propose SPlanner, a plug-and-play planning module that generates execution plans to guide vision language models (VLMs) in executing tasks. The proposed planning module utilizes extended finite state machines (EFSMs) to model the control logic and configurations of mobile applications. It then decomposes a user instruction into a sequence of primary functions modeled in EFSMs, and generates the execution path by traversing the EFSMs. We further refine the execution path into a natural language plan using an LLM. The final plan is concise and actionable, and effectively guides VLMs to generate interactive GUI actions to accomplish user tasks. SPlanner demonstrates strong performance on dynamic benchmarks reflecting real-world mobile usage. On the AndroidWorld benchmark, SPlanner achieves a 63.8% task success rate when paired with Qwen2.5-VL-72B as the VLM executor, yielding a 28.8 percentage point improvement compared to using Qwen2.5-VL-72B without planning assistance.
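
The path-generation step described in this abstract can be illustrated with a minimal sketch. It is not SPlanner's implementation: the toy app states, the action names, and the `find_path` helper are all hypothetical, and the guards and variables that make a state machine "extended" are omitted for brevity. It only shows how traversing a state machine yields an ordered list of actions that could later be turned into a natural language plan.

```python
from collections import deque

# Hypothetical state machine for a toy notes app (assumption, not from the paper):
# each state maps an action name to the state that action leads to.
TRANSITIONS = {
    "home": {"tap_new_note": "editor", "tap_settings": "settings"},
    "editor": {"tap_save": "note_saved", "tap_back": "home"},
    "settings": {"tap_back": "home"},
    "note_saved": {},
}


def find_path(start, goal, transitions):
    """Breadth-first search for the shortest action sequence from start to goal.

    Returns a list of action names, or None if the goal is unreachable.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, next_state in transitions.get(state, {}).items():
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None


if __name__ == "__main__":
    print(find_path("home", "note_saved", TRANSITIONS))
```

Breadth-first search is used here only because it returns a shortest path in an unweighted graph; the abstract does not state which traversal strategy the authors use.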


RiM: Record, Improve and Maintain Physical Well-being using Federated Learning

Mishra, Aditya, Lone, Haroon

arXiv.org Artificial Intelligence

In academic settings, the demanding environment often forces students to prioritize academic performance over their physical well-being. Moreover, privacy concerns and the inherent risk of data breaches hinder the deployment of traditional machine learning techniques for addressing these health challenges. In this study, we introduce RiM: Record, Improve, and Maintain, a mobile application which incorporates a novel personalized machine learning framework that leverages federated learning to enhance students' physical well-being by analyzing their lifestyle habits. Our approach involves pre-training a multilayer perceptron (MLP) model on a large-scale simulated dataset to generate personalized recommendations. Subsequently, we employ federated learning to fine-tune the model using data from IISER Bhopal students, thereby ensuring its applicability in real-world scenarios. The federated learning approach guarantees differential privacy by exclusively sharing model weights rather than raw data. Experimental results show that the FedAvg-based RiM model achieves an average accuracy of 60.71% and a mean absolute error (MAE) of 0.91, outperforming the FedPer variant (average accuracy 46.34%, MAE 1.19) and demonstrating its efficacy in predicting lifestyle deficits under privacy-preserving constraints.
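
For readers unfamiliar with FedAvg, the aggregation step it refers to can be sketched as a weighted average of client model weights, where each client is weighted by its number of local samples. The sketch below is illustrative only: the flat-list weight representation and the `fed_avg` name are assumptions, and it does not reproduce the RiM training pipeline.

```python
def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighting each client by its sample count.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes: number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 samples respectively.
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only the weight vectors and sample counts reach the server in this scheme; the raw records stay on each device.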


Local Herb Identification Using Transfer Learning: A CNN-Powered Mobile Application for Nepalese Flora

Thapa, Prajwal, Sharma, Mridul, Nyachhyon, Jinu, Pandeya, Yagya Raj

arXiv.org Artificial Intelligence

Herb classification presents a critical challenge in botanical research, particularly in regions with rich biodiversity such as Nepal. This study introduces a novel deep learning approach for classifying 60 different herb species using Convolutional Neural Networks (CNNs) and transfer learning techniques. Using a manually curated dataset of 12,000 herb images, we developed a robust machine learning model that addresses existing limitations in herb recognition methodologies. Our research employed multiple model architectures, including DenseNet121, 50-layer Residual Network (ResNet50), 16-layer Visual Geometry Group Network (VGG16), InceptionV3, EfficientNetV2, and Vision Transformer (ViT), with DenseNet121 ultimately demonstrating superior performance. Data augmentation and regularization techniques were applied to mitigate overfitting and enhance the generalizability of the model. This work advances herb classification techniques, preserving traditional botanical knowledge and promoting sustainable herb utilization.


A BERT Based Hybrid Recommendation System For Academic Collaboration

N, Sangeetha, Thangaraj, Harish, Vashisht, Varun, Joshi, Eshaan, Verma, Kanishka, Katariya, Diya

arXiv.org Artificial Intelligence

Universities serve as a hub for academic collaboration, promoting the exchange of diverse ideas and perspectives among students and faculty through interdisciplinary dialogue. However, as universities expand in size, conventional networking approaches via student chapters, class groups, and faculty committees become cumbersome. To address this challenge, an academia-specific profile recommendation system is proposed to connect like-minded stakeholders within any university community. This study evaluates three techniques: Term Frequency-Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT), and a hybrid approach to generate effective recommendations. Due to the unlabelled nature of the dataset, Affinity Propagation cluster-based relabelling is performed to understand the grouping of similar profiles. The hybrid model demonstrated superior performance, evidenced by its similarity score, Silhouette score, Davies-Bouldin index, and Normalized Discounted Cumulative Gain (NDCG), achieving an optimal balance between diversity and relevance in recommendations. Furthermore, the optimal model has been implemented as a mobile application, which dynamically suggests relevant profiles based on users' skills and collaboration interests, incorporating contextual understanding. The potential impact of this application is significant, as it promises to enhance networking opportunities within large academic institutions through the deployment of intelligent recommendation systems.
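
The abstract does not say how the hybrid approach combines TF-IDF and BERT. One common option, shown here purely as an assumption, is a weighted blend of the cosine similarities computed in each representation. The `alpha` weight, the function names, and the toy vectors are hypothetical; real inputs would be TF-IDF vectors and BERT embeddings of user profiles.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_score(tfidf_a, tfidf_b, emb_a, emb_b, alpha=0.5):
    """Blend lexical (TF-IDF) and semantic (embedding) similarity.

    alpha is a hypothetical mixing weight: 1.0 uses TF-IDF only, 0.0 embeddings only.
    """
    return alpha * cosine(tfidf_a, tfidf_b) + (1 - alpha) * cosine(emb_a, emb_b)


if __name__ == "__main__":
    # Toy vectors: identical lexical profiles, orthogonal embeddings.
    print(hybrid_score([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

Profiles would then be ranked by this score, and tuning `alpha` shifts the balance between exact keyword overlap and contextual similarity.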


PersonaAI: Leveraging Retrieval-Augmented Generation and Personalized Context for AI-Driven Digital Avatars

Kimara, Elvis, Oguntoye, Kunle S., Sun, Jian

arXiv.org Artificial Intelligence

This paper introduces PersonaAI, a cutting-edge application that leverages Retrieval-Augmented Generation (RAG) and the LLAMA model to create highly personalized digital avatars capable of accurately mimicking individual personalities. Designed as a cloud-based mobile application, PersonaAI captures user data seamlessly, storing it in a secure database for retrieval and analysis. The result is a system that provides context-aware, accurate responses to user queries, enhancing the potential of AI-driven personalization. Why should you care? PersonaAI combines the scalability of RAG with the efficiency of prompt-engineered LLAMA3, offering a lightweight, sustainable alternative to traditional large language model (LLM) training methods. The system's novel approach to data collection, utilizing real-time user interactions via a mobile app, ensures enhanced context relevance while maintaining user privacy. By open-sourcing our implementation, we aim to foster adaptability and community-driven development. PersonaAI demonstrates how AI can transform interactions by merging efficiency, scalability, and personalization, making it a significant step forward in the future of digital avatars and personalized AI.


GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent

Zhao, Kangjia, Song, Jiahui, Sha, Leigang, Shen, Haozhan, Chen, Zhi, Zhao, Tiancheng, Liang, Xiubo, Yin, Jianwei

arXiv.org Artificial Intelligence

Nowadays, research on GUI agents is a hot topic in the AI community. However, current research focuses on GUI task automation, limiting the scope of applications in various GUI scenarios. In this paper, we propose a formalized and comprehensive environment to evaluate the entire process of automated GUI Testing (GTArena), offering a fair, standardized environment for consistent operation of diverse multimodal large language models. We divide the testing process into three key subtasks: test intention generation, test task execution, and GUI defect detection, and construct a benchmark dataset based on these to conduct a comprehensive evaluation. It evaluates the performance of different models using three data types: real mobile applications, mobile applications with artificially injected defects, and synthetic data, thoroughly assessing their capabilities in this relevant task. Additionally, we propose a method that helps researchers explore the correlation between the performance of multimodal large language models in specific scenarios and their general capabilities in standard benchmark tests. Experimental results indicate that even the most advanced models struggle to perform well across all subtasks of automated GUI Testing, highlighting a significant gap between the current capabilities of Autonomous GUI Testing and its practical, real-world applicability. This gap provides guidance for the future direction of GUI Agent development. Our code is available at https://github.com/ZJU-ACES-ISE/ChatUITest.


HumekaFL: Automated Detection of Neonatal Asphyxia Using Federated Learning

Zantou, Pamely, Guda, Blessed, Retta, Bereket, Inabeza, Gladys, Joe-Wong, Carlee, Gueye, Assane

arXiv.org Artificial Intelligence

Birth Asphyxia (BA) is a severe condition characterized by an insufficient supply of oxygen to a newborn during delivery. BA is one of the primary causes of neonatal death in the world. Although there has been a decline in neonatal deaths over the past two decades, the developing world, particularly sub-Saharan Africa, continues to experience the highest under-five (<5) mortality rates. While evidence-based methods are commonly used to detect BA in African healthcare settings, they can be subject to physician errors or delays in diagnosis, preventing timely interventions. Centralized Machine Learning (ML) methods have demonstrated good performance in early detection of BA but require sensitive health data to leave their premises before training, which does not guarantee privacy and security. Healthcare institutions are therefore reluctant to adopt such solutions in Africa. To address this challenge, we suggest a federated learning (FL)-based software architecture, a distributed learning method that prioritizes privacy and security by design. We have developed a user-friendly and cost-effective mobile application embedding the FL pipeline for early detection of BA. Our Federated SVM model outperformed centralized SVM pipelines and Neural Networks (NN)-based methods in the existing literature.


Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration

Guan, Yanchu, Wang, Dong, Wang, Yan, Wang, Haiqing, Sun, Renen, Zhuang, Chenyi, Gu, Jinjie, Chu, Zhixuan

arXiv.org Artificial Intelligence

Autonomous mobile app interaction has become increasingly important with the growing complexity of mobile applications. Developing intelligent agents that can effectively navigate and interact with mobile apps remains a significant challenge. In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning from demonstrations to create intelligent and explainable agents for autonomous mobile app interaction. EBC-LLMAgent consists of three core modules: Demonstration Encoding, Code Generation, and UI Mapping, which work synergistically to capture user demonstrations, generate executable code, and establish accurate correspondence between code and UI elements. We introduce the Behavior Cloning Chain Fusion technique to enhance the generalization capabilities of the agent. Extensive experiments on five popular mobile applications from diverse domains demonstrate the superior performance of EBC-LLMAgent, achieving high success rates in task completion, efficient generalization to unseen scenarios, and the generation of meaningful explanations.


From Lab to Pocket: A Novel Continual Learning-based Mobile Application for Screening COVID-19

Falero, Danny, Kabir, Muhammad Ashad, Homaira, Nusrat

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has emerged as a promising tool for predicting COVID-19 from medical images. In this paper, we propose a novel continual learning-based approach and present the design and implementation of a mobile application for screening COVID-19. Our approach demonstrates the ability to adapt to evolving datasets, including data collected from different locations or hospitals, varying virus strains, and diverse clinical presentations, without retraining from scratch. We have evaluated state-of-the-art continual learning methods for detecting COVID-19 from chest X-rays and selected the best-performing model for our mobile app. We evaluated various deep learning architectures to select the best-performing one as a foundation model for continual learning. Both regularization and memory-based methods for continual learning were tested, using different memory sizes to develop the optimal continual learning model for our app. DenseNet161 emerged as the best foundation model with 96.87% accuracy, and Learning without Forgetting (LwF) was the top continual learning method with an overall performance of 71.99%. The mobile app design considers both patient and doctor perspectives. It incorporates the continual learning DenseNet161 LwF model on a cloud server, enabling the model to learn from new instances of chest X-rays and their classifications as they are submitted. The app is designed, implemented, and evaluated to ensure it provides an efficient tool for COVID-19 screening. The app is available to download from https://github.com/DannyFGitHub/COVID-19PneumoCheckApp.